Abstract. The Traveling Salesman Problem is one of the most studied
problems in computational complexity and its approximability has been
a long standing open question. Currently, the best known
inapproximability threshold is 220/219, due to Papadimitriou and
Vempala. Here, using an essentially different construction and also
relying on the work of Berman and Karpinski on bounded occurrence CSPs,
we give an alternative and simpler inapproximability proof which
improves the bound to 185/184.

1 Introduction

The Traveling Salesman Problem (TSP) is one of the most widely studied

algorithmic problems and deriving optimal approximability results for it 
has been a long-standing question. Recently, after more than thirty
years, there has been much progress on the algorithmic front, at least
in the important special case where the instance metric is derived from
an unweighted graph, often referred to as Graphic TSP. The
3/2-approximation algorithm by Christofides was the best known until
Gharan et al. gave a slight improvement [6] for Graphic TSP. Then an
algorithm with approximation ratio 1.461 was given by Mömke and
Svensson [9]. With an improved analysis of their algorithm Mucha
obtained a ratio of 13/9 [10], while the best currently known algorithm
has ratio 1.4 and is due to Sebő and Vygen.

Nevertheless, there is still a huge gap between the guarantee of the
best approximation algorithms we know and the best inapproximability
results. The TSP was first shown MAXSNP-hard in [14], where no explicit
inapproximability constant was derived. The work of Engebretsen [5] and
Böckenhauer et al. [3] gave inapproximability thresholds of 5381/5380
and 3813/3812 respectively. Later, this was improved to 220/219 in [13]
by Papadimitriou and Vempala.¹ No further progress has been made on the
inapproximability threshold of this problem in the more than ten years
since [12].

¹ The reduction of [13] was first presented in [12], which (erroneously)
claimed a better bound.



Overview: Our main objective in this paper is to give a different, less
complicated inapproximability proof for TSP than the one given in [12,13].
The proof of [13] is very much optimized to achieve a good constant:
the authors reduce directly from MAX-E3-LIN2, a constraint satisfaction
problem (CSP) for which optimal inapproximability results are known,
due to Håstad [7]. They take care to avoid introducing extra gadgets for
the variables, using only gadgets that encode the equations. Finally they 
define their own custom expander-like notion on graphs to ensure consis- 
tency between tours and assignments. Then the reduction is performed in 
essentially one step. 

Here on the other hand we take the opposite approach, choosing 
simplicity over optimization. We also start from MAX-E3-LIN2 but go 
through two intermediate CSPs. The first step in our reduction gives a set 
of equations where each variable appears at most five times (this prop- 
erty will come in handy in the end when proving consistency between 
tours and assignments). In this step, rather than introducing something 
new we rely heavily on machinery developed by Berman and Karpinski to 
prove inapproximability for bounded occurrence CSPs [1,2,3]. As a second
step we reduce to MAX-1-IN-3-SAT. The motivation is that the 1-IN-3 
predicate nicely corresponds to the objectives of TSP, since we represent 
clauses by gadgets and the most economical solution will visit all gadgets 
once but not more than once. Another way to view this step is that we use 
MAX-1-IN-3-SAT as an aid to design a TSP gadget for parity. Finally, 
we give a reduction from MAX-1-IN-3-SAT to TSP. 

This approach is (at least arguably) simpler than the approach of
[13], since some of our arguments can be broken down into independent
pieces, arguing about the inapproximability of intermediate, specially
constructed CSPs. We also benefit from re-using out of the box the
amplifier construction of [3]. Interestingly, putting everything together we end up
obtaining a slightly better constant than the one currently known, imply- 
ing that there may still be some room for further improvement. Though we 
are still a long way from an optimal inapproximability result, our results 
show that there may still be hope for better bounds with existing tools. 
Exploring how far these techniques can take us with respect to TSP (and 
also its variants, see for example [8]) may thus be an interesting question. 

The main result of this paper is given below and it follows directly
from the construction in Section 4.1 and Lemmata 1 and 2.

Theorem 1. For all $\epsilon > 0$ there is no polynomial-time
$(\frac{185}{184} - \epsilon)$-approximation algorithm for TSP, unless
P = NP.



2 Preliminaries 



We will denote graphs by G(V, E). All graphs are assumed to be
undirected, loop-less and edge-weighted, meaning that there is also a
function $w : E \to \mathbb{R}^+$. In some cases we will allow E to be
a multi-set, that is, we may allow parallel edges. In the case of a
multi-set E that contains several copies of some elements, when we
write $\sum_{e \in E} w(e)$ we mean the sum that has one term for each
copy. A (multi-)graph is Eulerian if there exists a closed walk that
visits all its vertices and uses each edge once. It is well known that
a (multi-)graph is Eulerian iff it is connected and all its vertices
have even degree. We will use [n] to denote the set {1, 2, ..., n}.
We will use E[X] to denote the expectation of a random variable X. 

In the metric Traveling Salesman Problem (TSP) we are given as input
an edge-weighted undirected graph G(V, E). Let d(u, v), for
$u, v \in V$, denote the shortest-path distance between u and v. The
objective is to find an ordering $v_1, v_2, \ldots, v_n$ of the vertices
such that $\sum_{i=1}^{n-1} d(v_i, v_{i+1}) + d(v_n, v_1)$ is minimized.
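To make the objective concrete, here is a minimal sketch that computes the shortest-path metric d and evaluates the cyclic cost of an ordering. The function names and the brute-force search are our own illustrative choices and are only practical for very small instances.

```python
from itertools import permutations

def shortest_path_metric(n, edges):
    """All-pairs shortest-path distances d(u, v) via Floyd-Warshall.

    `edges` is a list of (u, v, w) triples on vertices 0..n-1.
    """
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def tour_cost(d, order):
    """Cost of visiting the vertices in `order` and returning to the start."""
    n = len(order)
    return sum(d[order[i]][order[(i + 1) % n]] for i in range(n))

def brute_force_tsp(n, edges):
    """Optimal tour cost by trying every ordering that starts at vertex 0."""
    d = shortest_path_metric(n, edges)
    return min(tour_cost(d, (0,) + p) for p in permutations(range(1, n)))
```

For instance, on a path with three unit-weight edges every tour must traverse each edge twice, so `brute_force_tsp(4, [(0, 1, 1), (1, 2, 1), (2, 3, 1)])` evaluates to 6.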

Another, equivalent view of the TSP is the following: given an
edge-weighted graph G(V, E) we seek to find a multi-set $E_T$ consisting
of edges from E such that the graph induced by $E_T$ spans V, is Eulerian
and the sum of the weights of all edges in $E_T$ is minimized. It is not hard to see
that the two formulations are equivalent. We will make use of this second 
formulation because it makes some arguments on our construction easier. 

We generalize the Eulerian multi-graph formulation as follows: a
multi-set $E_T$ of edges from E is a quasi-tour iff the degrees of all
vertices in the multi-graph $G_T(V, E_T)$ are even. The cost of a
quasi-tour is defined as $\sum_{e \in E_T} w(e) + 2(c(G_T) - 1)$, where
$c(G_T)$ denotes the number of connected components of the multi-graph.
It is not hard to see that a TSP tour can also be considered a
quasi-tour with the same cost (since for a normal tour $c(G_T) = 1$),
but in a weighted graph there could potentially be a quasi-tour that is
cheaper than the optimal tour.
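The quasi-tour cost can be sketched in a few lines. The representation below (a list of weighted edges with repeats, and a union-find to count components) is an assumption made for illustration; isolated vertices are counted as their own components, following the definition over the full vertex set V.

```python
from collections import defaultdict

def quasi_tour_cost(vertices, multi_edges):
    """Quasi-tour cost of a multiset of edges, or None if a degree is odd.

    `vertices` is a re-iterable collection (list, range, ...);
    `multi_edges` is a list of (u, v, w) triples, repeats allowed.
    """
    degree = defaultdict(int)
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v, _ in multi_edges:
        degree[u] += 1
        degree[v] += 1
        parent[find(u)] = find(v)

    if any(degree[v] % 2 for v in vertices):
        return None  # not a quasi-tour: some vertex has odd degree
    components = len({find(v) for v in vertices})
    return sum(w for _, _, w in multi_edges) + 2 * (components - 1)
```

Two disjoint unit-weight triangles give cost 6 + 2 = 8: the edge weights plus a penalty of 2 for the extra component.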

2.1 Forced edges 

As mentioned, we will view TSP as the problem of selecting edges from E
to form a minimum-weight multi-set $E_T$ that makes the graph Eulerian.
It is easy to see that no edge will be selected more than twice, since if an
edge is selected three times we can remove two copies of it from $E_T$ and
the graph will still be Eulerian while we have improved the cost.

In our construction we would like to be able to stipulate that some
edges are to be used at least once in any valid tour. We can achieve this
with the following trick: suppose that there is an edge (u, v) with weight
w that we want to force into every tour. We sub-divide this edge a large
number of times, say $p - 1$, that is, we remove the edge and replace it
with a path of p edges going through new vertices of degree two. We then
redistribute the original edge's weight to the p newly formed edges, so 
that each has weight w/p. Now, any tour that fails to use two or more 
of the newly formed edges must be disconnected. Any tour that fails to 
use exactly one of them can be augmented by adding two copies of the 
unused edge. This only increases the cost by 2w/p, which can be made 
arbitrarily small by giving p an appropriately large value. Therefore, we 
may assume without loss of generality that in our construction we can 
force some edges to be used at least once. Note that these arguments 
apply also to quasi-tours. 
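The subdivision trick can be sketched as follows. The naming of the new internal vertices as tuples (u, v, i) is a hypothetical convention chosen only for this illustration; exact rational weights are used so that the pieces sum back to w.

```python
from fractions import Fraction

def subdivide_edge(u, v, w, p):
    """Replace edge (u, v) of weight w by a path of p edges of weight w/p.

    The p - 1 new internal vertices are named (u, v, i), which is a
    hypothetical naming scheme used only in this sketch.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    w = Fraction(w)
    nodes = [u] + [(u, v, i) for i in range(1, p)] + [v]
    return [(nodes[i], nodes[i + 1], w / p) for i in range(p)]

def repair_cost(w, p):
    """Extra cost of adding two copies of the single unused path edge."""
    return 2 * Fraction(w) / p
```

With w = 3/2 and p = 6 the repair cost is 1/2, and it tends to 0 as p grows, which is why forcing edges costs essentially nothing.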

3 Intermediate CSPs 

In this section we will design and prove inapproximability for a family of 
instances of MAX-1-IN-3-SAT with some special structure. We will use 
these instances (and their structure) in the next section where we reduce 
from MAX-1-IN-3-SAT to TSP. 

Let $I_1$ be a system of m linear equations mod 2, each consisting of
exactly three variables. Let n be the total number of variables appearing
in $I_1$ and let the variables be denoted as $x_i$, $i \in [n]$. Let B be
the maximum number of times any variable appears. We will make use of the
following seminal result due to Håstad:

Theorem 2 ([7]). For all $\epsilon > 0$ there exists a B such that given
an instance $I_1$ as above it is NP-hard to decide if there is an
assignment that satisfies at least $(1 - \epsilon)m$ equations or all
assignments satisfy at most $(\frac{1}{2} + \epsilon)m$ equations.

3.1 Bounded Occurrences

In $I_1$ each variable appears at most a constant number of times B, where
B depends on $\epsilon$. We would like to reduce the maximum number of
occurrences of each variable to a small absolute constant. For this, one typically
uses some kind of expander or amplifier construction. Here we will rely 
on a construction due to Berman and Karpinski that reduces the number 
of occurrences to 5. 



Theorem 3 ([3]). Consider the family of bipartite graphs G(L, R, E),
where |L| = B, |R| = 0.8B, all vertices of L have degree 4, all vertices
of R have degree 5 and B is a sufficiently large multiple of 5. If we
select uniformly at random a graph from this family then with high
probability it has the following property: for any
$S \subseteq L \cup R$ such that $|S \cap L| \le |L|/2$ the number of
edges in E with exactly one endpoint in S is at least $|S \cap L|$.

We now use the above construction to construct a system of equations
where each variable appears exactly 5 times. First, we may assume that in
$I_1$ the number of appearances of each variable is a multiple of 5
(otherwise, repeat all equations five times). Also, by repeating all the
equations we can make sure that all variables appear at least B' times,
where B' is a sufficiently large number to make Theorem 3 hold.

For each variable $x_i$ in $I_1$ we introduce the variables $x_{(i,j)}$,
$j \in [d(i)]$ and $y_{(i,j)}$, $j \in [0.8d(i)]$, where d(i) is the
number of appearances of $x_i$ in the original instance. We call
$X_i = \{x_{(i,j)} \mid j \in [d(i)]\} \cup \{y_{(i,j)} \mid j \in [0.8d(i)]\}$
the cloud that corresponds to $x_i$. Construct a bipartite graph with the
property described in Theorem 3 with L = [d(i)], R = [0.8d(i)] (since
$d(i) \le B$ is a constant that depends only on $\epsilon$ this can be
done in constant time by brute force). For each edge $(j, k) \in E$
introduce the equation $x_{(i,j)} + y_{(i,k)} = 1$. Finally, for each
equation $x_{i_1} + x_{i_2} + x_{i_3} = b$ in $I_1$, where this is the
$j_1$-th appearance of $x_{i_1}$, the $j_2$-th appearance of $x_{i_2}$
and the $j_3$-th appearance of $x_{i_3}$, replace it with the equation
$x_{(i_1,j_1)} + x_{(i_2,j_2)} + x_{(i_3,j_3)} = b$.

Denote this instance by $I_2$ and we have $|I_2| = 13m$, with 12m
equations having size 2. A consistent assignment to a cloud $X_i$ is an
assignment that sets all $x_{(i,j)}$ to b and all $y_{(i,j)}$ to
$1 - b$. By standard arguments using the graph of Theorem 3 we can show
that an optimal assignment to $I_2$ is consistent (in each inconsistent
cloud let S be the vertices with the minority assignment; flipping all
variables of S cannot make the solution worse). From this it follows
that it is NP-hard to distinguish if the maximum number of satisfiable
equations is at least $(13 - \epsilon)m$ or at most $(12.5 + \epsilon)m$.
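The bookkeeping of this step can be checked with a small sketch. It builds the bounded-occurrence system from a list of size-three equations, but it substitutes a simple deterministic (4,5)-biregular bipartite graph for the random graph of Theorem 3. That placeholder is our assumption and carries no expansion guarantee; it only reproduces the degree structure, which is enough to verify the counts (13m equations, every variable appearing exactly 5 times). The tuple encoding of variables is likewise illustrative.

```python
from collections import Counter

def placeholder_biregular_graph(d):
    """Edges of a (4,5)-biregular bipartite graph on L=[d], R=[0.8d].

    Deterministic stand-in (0-indexed); NOT guaranteed to have the
    expansion property required by Theorem 3.
    """
    assert d > 0 and d % 5 == 0
    r = 4 * d // 5
    return [(e // 4, e % r) for e in range(4 * d)]

def build_bounded_system(equations):
    """Build the bounded-occurrence system from size-three equations.

    `equations` is a list of ((i1, i2, i3), b), meaning
    x_i1 + x_i2 + x_i3 = b (mod 2) with three distinct variables.
    Variables of the result are ("x", i, j) and ("y", i, k).
    """
    occurrences = Counter(i for vars_, _ in equations for i in vars_)
    result = []
    # Size-two equations x_(i,j) + y_(i,k) = 1, one per graph edge.
    for i, d in occurrences.items():
        for j, k in placeholder_biregular_graph(d):
            result.append(((("x", i, j), ("y", i, k)), 1))
    # Each original equation uses a fresh copy of each of its variables.
    seen = Counter()
    for vars_, b in equations:
        renamed = []
        for i in vars_:
            renamed.append(("x", i, seen[i]))
            seen[i] += 1
        result.append((tuple(renamed), b))
    return result
```

Starting from five copies of the equation x_0 + x_1 + x_2 = 1 (so m = 5 and every d(i) = 5), the sketch returns 65 = 13m equations, 60 of which have size two.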

3.2 MAX-1-IN-3-SAT 

In the MAX-1-IN-3-SAT problem we are given a collection of clauses
$(l_i \vee l_j \vee l_k)$, each consisting of at most three literals,
where each literal is either a variable or its negation. A clause is
satisfied by a truth assignment if exactly one of its literals is set to
True. The problem is to find an assignment that satisfies the maximum
number of clauses.



We would like to produce a MAX-1-IN-3-SAT instance from $I_2$. Observe
that it is easy to turn the size two equations
$x_{(i,j)} + y_{(i,k)} = 1$ to the equivalent clauses
$(x_{(i,j)} \vee y_{(i,k)})$. We only need to worry about the m
equations of size three.

If the k-th size-three equation of $I_2$ is
$x_{(i_1,j_1)} + x_{(i_2,j_2)} + x_{(i_3,j_3)} = 1$ we introduce three
new auxiliary variables $a_{(k,i)}$, $i \in [3]$ and replace the
equation with the three clauses
$(x_{(i_1,j_1)} \vee a_{(k,1)} \vee a_{(k,2)})$,
$(x_{(i_2,j_2)} \vee a_{(k,2)} \vee a_{(k,3)})$,
$(x_{(i_3,j_3)} \vee a_{(k,1)} \vee a_{(k,3)})$. If the right-hand-side
of the equation is 0 then we add the same three clauses except we negate
$x_{(i_1,j_1)}$ in the first clause. We call these three clauses the
cluster that corresponds to the k-th equation.

It is not hard to see that if we fix an assignment to $x_{(i_1,j_1)}$,
$x_{(i_2,j_2)}$, $x_{(i_3,j_3)}$ that satisfies the k-th equation of
$I_2$ then there exists an assignment to $a_{(k,1)}$, $a_{(k,2)}$,
$a_{(k,3)}$ that satisfies the whole cluster. Otherwise, at most two of
the clauses of the cluster can be satisfied. Furthermore, in this case
there exist three different assignments to the auxiliary variables that
satisfy two clauses and each leaves a different clause unsatisfied.
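The claims about a cluster are small enough to check exhaustively. The sketch below enumerates all assignments to the three main variables and the three auxiliary variables; the function names are ours, and bits 0/1 stand for False/True.

```python
from itertools import product

def cluster_pattern(x, a, rhs=1):
    """Which of the three 1-in-3 clauses of a cluster are satisfied.

    x = (x1, x2, x3) are the main variables, a = (a1, a2, a3) the
    auxiliary ones. For rhs == 0 the literal x1 is negated in clause 1.
    """
    first = x[0] if rhs == 1 else 1 - x[0]
    clauses = [
        (first, a[0], a[1]),
        (x[1], a[1], a[2]),
        (x[2], a[0], a[2]),
    ]
    # A clause holds iff exactly one of its literals is True.
    return tuple(sum(c) == 1 for c in clauses)

def check_cluster(rhs):
    """Verify the three claims for every assignment to x1, x2, x3."""
    one_left_out = {
        (False, True, True),
        (True, False, True),
        (True, True, False),
    }
    for x in product((0, 1), repeat=3):
        patterns = {cluster_pattern(x, a, rhs)
                    for a in product((0, 1), repeat=3)}
        best = max(sum(p) for p in patterns)
        if sum(x) % 2 == rhs:
            if best != 3:          # equation holds: cluster satisfiable
                return False
        else:
            if best != 2:          # equation fails: at most two clauses
                return False
            if not one_left_out <= patterns:
                return False       # any single clause can be the loser
    return True
```

Both `check_cluster(1)` and `check_cluster(0)` return True, which confirms the three properties for either right-hand side.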

From now on, we will denote by M the set of (main) variables
$x_{(i,j)}$, by C the set of (checker) variables $y_{(i,j)}$ and by A
the set of (auxiliary) variables $a_{(k,i)}$. Call the instance of
MAX-1-IN-3-SAT we have constructed $I_3$. Note that it consists of 15m
clauses and 8.4m variables.

4 TSP 

4.1 Construction 

We now describe a construction that encodes $I_3$ into a TSP instance
G(V, E). Rather than viewing this as a generic construction from
MAX-1-IN-3-SAT to TSP, we will at times need to use facts that stem from
the special structure of $I_3$. In particular, the fact that variables
can be partitioned into sets M, C, A, such that variables in $M \cup C$
appear five times and variables in A appear twice; the fact that most
clauses have size two and they involve one positive variable from M and
one positive variable from C; and also the fact that clauses of size
three come in clusters as described in the construction of $I_3$.

As mentioned, we assume that in the graph G(V, E) we may include
some forced edges, that is, edges that have to be used at least once in any
tour. The graph includes a central vertex, which we will call s. For each
variable $x \in M \cup C \cup A$ we introduce two new vertices named
$x^L$ and $x^R$, which we will call the left and right terminal
associated with x. We add a forced edge from each terminal to s. For
terminals that correspond to variables in $M \cup C$ this edge has
weight 7/4, while for variables in A it has weight 1/2. We also add two
(parallel) non-forced edges between each pair of terminals representing
the same variable, each having a weight of 1 (we will later break down
at least one from each pair of these, so the graph we will obtain in the
end will be simple). Informally, these two edges encode an assignment to
each variable: we arbitrarily label one the True edge and the other the
False edge, the idea being that a tour should pick exactly one of these
for each variable and that will give us an assignment. We will re-route
these edges through the clause gadgets as we introduce them, depending
on whether each variable appears in a clause positive or negative.

Fig. 1. Example construction for the clauses $(x \vee y) \wedge (x \vee z)$.
Forced edges are denoted by dashed lines. There are two terminals for
each variable and two gadgets that represent the two clauses. The True
edges incident on the terminals are re-routed through the gadgets where
each variable appears positive. The False edges connect the terminals
directly since no variable appears anywhere negated.

Now, we add some gadgets to encode the size-two clauses of $I_3$. Let
$(x_{(i,j_1)} \vee y_{(i,j_2)})$ be a clause of $I_3$ and suppose that
this is the $k_1$-th clause that contains $x_{(i,j_1)}$ and the $k_2$-th
clause that contains $y_{(i,j_2)}$, $k_1, k_2 \in [5]$. Then we add two
new vertices to the graph, call them $x^{k_1}_{(i,j_1)}$ and
$y^{k_2}_{(i,j_2)}$. Add two forced edges between them, each of weight
3/2 (recall that forced edges represent long paths, so these are not
really parallel edges). Finally, re-route the True edges incident on
$x^L_{(i,j_1)}$ and $y^L_{(i,j_2)}$ through $x^{k_1}_{(i,j_1)}$ and
$y^{k_2}_{(i,j_2)}$ respectively. More precisely, if the True edge
incident on $x^L_{(i,j_1)}$ connects it to some other vertex u, remove
that edge from the graph and add an edge from $x^L_{(i,j_1)}$ to
$x^{k_1}_{(i,j_1)}$ and an edge from $x^{k_1}_{(i,j_1)}$ to u. All these
edges have weight one and are non-forced (see Figure 1).



We use a similar gadget for clauses of size three. Consider a cluster
$(x_{(i_1,j_1)} \vee a_{(k,1)} \vee a_{(k,2)})$,
$(x_{(i_2,j_2)} \vee a_{(k,2)} \vee a_{(k,3)})$,
$(x_{(i_3,j_3)} \vee a_{(k,1)} \vee a_{(k,3)})$ and suppose for
simplicity that this is the fifth appearance for all the main variables
of the cluster. Then we add the new vertices $x^5_{(i_1,j_1)}$,
$x^5_{(i_2,j_2)}$, $x^5_{(i_3,j_3)}$ and also the vertices
$a^1_{(k,1)}$, $a^2_{(k,1)}$, $a^1_{(k,2)}$, $a^2_{(k,2)}$ and
$a^1_{(k,3)}$, $a^2_{(k,3)}$. To encode the first clause we add two
forced edges of weight 5/4, one from $x^5_{(i_1,j_1)}$ to $a^1_{(k,1)}$
and one from $x^5_{(i_1,j_1)}$ to $a^1_{(k,2)}$. We also add a forced
edge of weight 1 from $a^1_{(k,1)}$ to $a^1_{(k,2)}$, thus making a
triangle with the forced edges (see Figure 2). We re-route the True edge
from $a^L_{(k,1)}$ through $a^1_{(k,1)}$ and $a^2_{(k,1)}$. We do
similarly for the other two auxiliary variables and the main variables.
Finally for a cluster where $x_{(i_1,j_1)}$ is negated, we use the same
construction except that rather than re-routing the True edge that is
incident on $x^L_{(i_1,j_1)}$ we re-route the False edge. This completes
the construction.




Fig. 2. Example construction fragment for the cluster
$(x_1 \vee a_1 \vee a_2) \wedge (x_2 \vee a_2 \vee a_3) \wedge (x_3 \vee a_1 \vee a_3)$.
The False edges which connect each pair of terminals and the forced
edges that connect terminals to s are not shown.



4.2 From Assignment to Tour 

Let us now prove one direction of the reduction and in the process also give 
some intuition about the construction. Call the graph we have constructed 
G(V,E). 

Lemma 1. If there exists an assignment to the variables of $I_3$ that
leaves at most k clauses unsatisfied, then there is a tour of G with cost
at most T = L + k, where L = 91.8m.

Proof. Observe that by construction we may assume that all the
unsatisfied clauses of $I_3$ are in the clusters and that at most one
clause in each cluster is unsatisfied, otherwise we can obtain a better
assignment. Also, if an unsatisfied clause has all literals set to False
we can flip the value of one of the auxiliary variables without
increasing the number of violated clauses. Thus, we may assume that all
clauses have a True literal. Also, we may assume that no clause has all
literals set to True: suppose that a clause does, then both auxiliary
variables of the clause are True. We set them both to False, gaining one
clause. If this causes the two other clauses of the cluster to become
unsatisfied, set the remaining auxiliary variable to True. We conclude
that all clauses have either one or two True literals.

Our tour uses all forced edges exactly once. For each variable x set
to True in the assignment the tour selects the True edge incident on the
terminal corresponding to x. If the edge has been re-routed all its pieces
are selected, so that we have selected edges that make up a path from
$x^L$ to $x^R$. Otherwise, if x is set to False in the assignment the
tour selects the corresponding False path.

Observe that this is a valid quasi-tour because all vertices have even 
degree (for each terminal we have selected the forced edge plus one more 
edge, for gadget vertices we have selected the two forced edges and pos- 
sibly the two edges through which True or False was re-routed). Also, 
observe that the tour must be connected, because each clause contains a 
True literal, therefore for each gadget two of its external edges have been 
selected and they are part of a path that leads to the terminals. 

The cost of the tour is at most F + N + M + k, where F is the total
cost of all forced edges in the graph and N, M are the total number of
variables and clauses respectively in $I_3$. To see this, notice that
there are 2N terminals, and there is one edge incident on each and there
are M clause gadgets, $M - k$ of which have two selected edges incident
on them and k of which have four. Summing up, this gives 2N + 2M + 2k,
but then each unit-weight edge has been counted twice, meaning that the
non-forced edges have a total cost of N + M + k.

Finally, we have N = 8.4m, M = 15m and
F = 3 × 12m + 3.5 × 3m + (7/2) × 5.4m + 1 × 3m = 68.4m, where the terms
are respectively the cost of size-two clause gadgets (two forced edges
of weight 3/2 each), the cost of size-three clause gadgets (two forced
edges of weight 5/4 and one of weight 1), the cost of edges connecting
terminals to s for the variables in $M \cup C$ and for the auxiliary
variables. We have F + N + M = 91.8m. □
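The arithmetic behind L = 91.8m can be reproduced with exact fractions. The sketch below recomputes F, N and M from the edge weights of the construction. The last line of the usage note is our own illustrative check, under the assumption that the gap of Section 3.1 translates into at most $\epsilon m$ versus at least $(1/2 - \epsilon)m$ unsatisfied clauses of $I_3$ and that the converse direction of Section 4.3 holds.

```python
from fractions import Fraction as Fr

def lemma1_constants(m=1):
    """Return (F, N, M, L) for an instance built from m equations."""
    m = Fr(m)
    main_and_checker = Fr(27, 5) * m   # |M u C| = 3m + 2.4m = 5.4m
    auxiliary = 3 * m                  # three per size-three equation

    forced = (
        2 * Fr(3, 2) * 12 * m              # size-two gadgets
        + (2 * Fr(5, 4) + 1) * 3 * m       # size-three gadgets (triangles)
        + 2 * Fr(7, 4) * main_and_checker  # terminal edges, weight 7/4
        + 2 * Fr(1, 2) * auxiliary         # terminal edges, weight 1/2
    )
    num_vars = main_and_checker + auxiliary   # N = 8.4m
    num_clauses = 15 * m                      # M = 12m + 3m
    return forced, num_vars, num_clauses, forced + num_vars + num_clauses

def gap_ratio(m=1):
    """Ratio (L + m/2) / L between the two cases as epsilon tends to 0."""
    _, _, _, base = lemma1_constants(m)
    return (base + Fr(m) / 2) / base
```

`lemma1_constants()` gives F = 342/5 = 68.4 and L = 459/5 = 91.8, and `gap_ratio()` is 923/918, which is slightly larger than 185/184.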

4.3 From Tour to Assignment 

We would like now to prove the converse of Lemma 1, namely that if a
tour of cost L + k exists then we can find an assignment that leaves at
most k clauses unsatisfied. Let us first give some high-level intuition and
in the process justify the weights we have selected in our construction.

Informally, we could start from a simple base case: suppose that we
have a tour such that all edges of G are used at most once. It is not hard
to see that this then corresponds to an assignment, as in the proof of
Lemma 1. So, the problem is how to avoid tours that may use some edges
twice.

To this end, we first give some local improvement arguments that 
make sure that the number of problematic edges, which are used twice, 
is limited. However, arguments like these can only take us so far, and we 
would like to avoid having too much case analysis. 

We therefore try to isolate the problem. For variables in $M \cup C$
which the tour treats honestly, that is, variables which are not involved
with edges used twice, we directly obtain an assignment from the tour. For
the other variables in $M \cup C$ we pick a random value and then extend
the whole assignment to A in an optimal way. We want to show that the
expected number of unsatisfied clauses is at most k.

The first point here is that if a clause containing only honest variables 
turns out to be violated, the tour must also be paying an extra cost for it. 
The difficulty is therefore concentrated on clauses with dishonest variables. 

By using some edges twice the tour is paying some cost on top of what
is accounted for in L. We would like to show that this extra cost is larger
than the number of clauses violated by the assignment. It is helpful to
think here that it is sufficient to show that the tour pays an additional
cost of 5/2 for each dishonest variable, since main variables appear 5
times.

A crucial point now is that, by a simple parity argument, there has
to be an even number of violations (that is, edges used twice) for each
variable (Lemma 4). This explains the weights we have picked for the
forced edges in size-three gadgets (5/4) and for edges connecting
terminals to s (7/4 = 5/4 + 1/2, or 5/4 extra to the cost already
included in L for fixing the parity of the terminal vertex). Two such
violations give enough extra cost to pay for the expected number of
unsatisfied clauses containing the variable.

At this point, we could also set the weights of forced edges in size-two
gadgets to 5/2, which would be split among the two dishonest variables
giving 5/4 to each. Then, any two violations would have enough additional
cost to pay for the expected unsatisfied clauses. However, we are slightly
more careful here: rather than setting all dishonest variables in
$M \cup C$ independently at random, we pick a random but consistent
assignment for each cloud. This ensures that all size-two clauses with
violations will be satisfied. Thus, it is sufficient for violations in
them to have a cost of 3/2: the amount "paid" to each variable is now
3/4 = 5/4 − 1/2, but the expected number of unsatisfied clauses with
this variable is also decreased by 1/2 since one clause is surely
satisfied.
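The informal accounting above can be checked with exact fractions. The sketch below encodes the per-violation payment to a dishonest variable and compares it with the heuristic expected loss of 1/2 per clause over 5 clauses, reduced by 1/2 for each violated size-two gadget. The dictionary keys and function names are ours, and this mirrors only the intuition of this section, not the full proof.

```python
from fractions import Fraction as Fr
from itertools import combinations_with_replacement

# Extra cost paid to one dishonest variable per violation (edge used twice).
CREDIT = {
    "size_three_edge": Fr(5, 4),
    "terminal_edge": Fr(7, 4) - Fr(1, 2),  # 1/2 is already counted in L
    "size_two_edge": Fr(3, 2) / 2,         # split between two variables
}

def credit(violations):
    """Total extra cost attributed to the variable."""
    return sum(CREDIT[v] for v in violations)

def expected_unsatisfied(violations):
    """Heuristic expected number of unsatisfied clauses of the variable.

    Each of the 5 clauses fails with probability 1/2, except that a
    size-two clause with a violation is surely satisfied.
    """
    return 5 * Fr(1, 2) - Fr(1, 2) * list(violations).count("size_two_edge")

def every_pair_pays():
    """Check that any two violations cover the expected loss."""
    return all(
        credit(pair) >= expected_unsatisfied(pair)
        for pair in combinations_with_replacement(CREDIT, 2)
    )
```

Every pair of violations is tight or better: for example, one size-three violation and one size-two violation pay 5/4 + 3/4 = 2 against an expected loss of 5/2 − 1/2 = 2.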

Let us now proceed to give the full details of the proof. Recall that if 
a tour of a certain cost exists, then there exists also a quasi-tour of the 
same cost. It suffices then to prove the following: 

Lemma 2. If there exists a quasi-tour of G with cost at most L + k then
there exists an assignment to the variables of $I_3$ that leaves at most k
clauses unsatisfied.

In order to prove Lemma 2 it is helpful to first make some easy
observations. First, observe that if a quasi-tour uses a unit-weight edge
twice then we can remove both of these appearances of the edge from the
solution without increasing the cost, since the number of components can
only increase by one. Therefore, all (non-forced) edges of weight one are
used at most once.

Second, if both forced edges of a gadget of size two are used twice then 
we can remove one appearance of each from the solution, decreasing the 
cost. Similarly, in a gadget of size three if two forced edges are used twice 
then we can drop one copy of each and use the third edge twice, making 
the tour cheaper. Therefore, in each gadget there is at most one forced 
edge that is used twice. 

Third, if both forced edges that connect the terminals $x^L$, $x^R$ to s
are used twice, then we can remove one appearance of each from the
solution and replace them by the shortest path from $x^L$ to $x^R$ that
uses only non-forced unit weight edges. This has weight at most one for
the auxiliary variables and two for the rest, which in both cases is at
most as much as the weight of the removed edges. Therefore, for each
variable x, at least one of the forced edges that connect $x^L$, $x^R$
to s is used exactly once.

Given a tour $E_T$, we will say that a variable x is honestly traversed in
that tour if all the forced edges that involve it are used exactly once
(this includes the forced edges incident on $x^L$, $x^R$ and $x^i$,
$i \in [5]$).

Let us now give two more useful facts. 

Lemma 3. There exists an optimal tour where all forced edges between 
two different vertices that correspond to two variables in A are used exactly 
once. 

Proof. We refer the reader again to Figure 2. Suppose for contradiction
that the edge $(a^1_1, a^1_2)$ is used twice (the other cases are
equivalent by symmetry since all vertices $a^i_j$ are connected to one
terminal and one other such vertex).

First, suppose that at least one of the edges that connect one of these
two endpoints to a terminal is selected, say the edge $(a^L_1, a^1_1)$.
Then modify the solution by removing that edge and a copy of the
duplicate forced edge and adding a copy of $(a^L_2, a^1_2)$,
$(s, a^L_2)$ and $(s, a^L_1)$. This does not increase the cost.

Second, suppose that both $(s, a^L_1)$ and $(s, a^L_2)$ are used twice
in the tour. Then we can modify the tour by dropping one copy of each and
a copy of the duplicate gadget edge and adding $(a^L_1, a^1_1)$ and
$(a^L_2, a^1_2)$.

Finally, suppose that none of the previous two cases is true. Thus,
neither of $(a^L_1, a^1_1)$, $(a^L_2, a^1_2)$ is used in the tour. This
means that $(a^1_1, a^2_1)$ and $(a^1_2, a^2_2)$ are both used to ensure
that $a^1_1$, $a^1_2$ have even degree. Also, one of the edges
connecting a terminal to s is used once, say $(s, a^L_1)$. This means
that the False edge incident to $a^L_1$ must be used to make the degree
of $a^L_1$ even. Remove the False edge and the edge $(a^1_1, a^2_1)$
from the tour and add the edges $(a^L_1, a^1_1)$ and $(a^2_1, a^R_1)$.
This reduces to the first case. □

Lemma 4. In an optimal tour, if a variable is dishonest then it must be 
dishonest twice. More precisely, the number of forced edges that involve 
the variable (either inside gadgets or connecting terminals to s) and are 
used twice must be even. 

Proof. Consider a variable x and first suppose that neither of the forced 
edges connecting s to the terminals is used twice, but there is a single 
forced edge in a gadget that is used twice. It follows that the vertex that 
corresponds to x in that gadget has an odd number of unit-weight edges 
incident to it selected. The two terminals have a single selected unit-weight 
edge incident on them and all other vertices that belong to x have an even 
number of incident unit-weight edges selected, since their total degree is 
even. Thus, summing the number of selected unit-weight edges incident 
on all the vertices that belong to x we get an odd number, which is a 
contradiction since we counted each such edge exactly twice. A similar 
argument applies if one assumes that one of the forced edges incident on 
the terminals is used twice and all other forced edges are used once. □ 



Observe that it follows from Lemmata 3 and 4 that if all the main
variables involved in a cluster are honest then the auxiliary variables
of that cluster are also honest. This holds because if the main variables
are honest then by Lemma 3 no forced edge inside the gadgets of the
cluster is used twice, so by Lemma 4 and the fact that at least one of
the forced edges incident on the terminals is used once, the auxiliary
variables are honest.



We would like now to be able to extract a good assignment even if a 
tour is not honest, thus indirectly proving that honest tours are optimal. 

Proof (Lemma 2). Consider the following algorithm to extract an
assignment from the tour: first, for each variable in $M \cup C$ that
was traversed honestly give it the same truth-value as in the tour, that
is, if the tour selects the True edge incident on the corresponding
terminal, set the variable to True, otherwise to False. To decide on the
value of the dishonest variables from $M \cup C$ produce n random bits
$b_i$, $i \in [n]$ (recall that n is the number of variables of $I_1$,
or the number of clouds in $I_2$). For each i set all dishonest
variables $x_{(i,j)}$ to be equal to $b_i$ and all dishonest $y_{(i,j)}$
to be equal to $1 - b_i$. This ensures that size-two clauses that
contain two dishonest variables are always satisfied, since these
clauses are always between two variables of the same cloud.
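The extraction step for the variables in $M \cup C$ can be sketched as follows. The encoding of variables as tuples (kind, i, j), with kind "x" for main and "y" for checker variables and i the cloud index, is an assumption of this illustration, as is the lazy sampling of the bits $b_i$ (equivalent to drawing all n bits up front). The auxiliary variables are handled separately and are not part of the sketch.

```python
import random

def extract_assignment(variables, honest_value, rng=None):
    """Assign truth values to main and checker variables.

    variables:    iterable of (kind, i, j), kind in {"x", "y"}.
    honest_value: dict mapping honestly traversed variables to the bool
                  read off the tour; all other variables are dishonest.
    Dishonest main variables of cloud i get the random bit b_i and
    dishonest checker variables get its complement.
    """
    rng = rng or random.Random()
    cloud_bit = {}
    assignment = {}
    for var in variables:
        kind, i, _ = var
        if var in honest_value:
            assignment[var] = honest_value[var]
            continue
        if i not in cloud_bit:
            cloud_bit[i] = rng.random() < 0.5  # the random bit b_i
        bit = cloud_bit[i]
        assignment[var] = bit if kind == "x" else not bit
    return assignment
```

Whatever the random bits are, a size-two clause on two dishonest variables $x_{(i,j)}$, $y_{(i,k)}$ of the same cloud ends up with exactly one True literal, so it is satisfied.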

Let us also assign the auxiliary variables. If there is an assignment to
the auxiliary variables of a cluster that satisfies all three clauses select
it. Otherwise, select an assignment that violates the clause of a dishonest
variable from M, if such a variable exists, and satisfies the other two. If
all main variables are honest, as we have argued the auxiliary variables
are also honest, so pick the corresponding assignment.

We now have a randomized assignment for $I_3$, so let us upper-bound
the expected number of unsatisfied clauses. Let U be a random variable
equal to the set of unsatisfied clauses and let $U = U_1 \cup U_2$ where
$U_1$ contains all the unsatisfied clauses that involve only honest
variables from $M \cup C$ and $U_2$ the rest. (Note that $U_1$ is not
random.)

The cost of the quasi-tour we have is T < F + N + M + k. Let Eq 
be the set of forced gadget edges that the tour uses twice. Let Es be the 
set of forced edges incident on s that the tour uses twice. Let E\ be the 
set of unit- weight edges that the tour uses (recall that each is used once). 
Let U[ be the set of clauses that correspond to gadgets the tour visits at 
least twice (meaning they have at least four incident edges selected). Let 
U" be the set of clauses that correspond to gadgets the tour does not visit 
(meaning that each forms its own connected component). 

We have T = J2eeE T w{e) + 2(c(G T ) — 1) = F + £ eei?i w(e) + 
EeeEc He) + EetEs He) + 2(c(G T ) - 1). 

By definition $\sum_{e \in E_1} w(e) = |E_1|$. Let us try to lower-bound this quantity 
using arguments similar to the proof of Lemma [T]. After the selection 
of the forced edges there are $2N - |E_s|$ terminals with odd degree, so each 
has a selected unit-weight edge incident to it. There are $|U_1'|$ gadgets with 
at least four selected incident edges and $M - |U_1'| - |U_1''|$ gadgets with two 
selected incident edges. Summing up we get $2N - |E_s| + 2M + 2|U_1'| - 2|U_1''|$, 
but each edge is counted twice, so we have $|E_1| \ge N - \frac{1}{2}|E_s| + M + |U_1'| - |U_1''|$. 
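
The counting step above can be checked numerically. The following is a minimal sketch, assuming the reading of the argument in which each of the $2N - |E_s|$ odd-degree terminals contributes one edge endpoint, each gadget in $U_1'$ contributes four, and each remaining visited gadget contributes two; the function and parameter names are hypothetical.

```python
from fractions import Fraction

def unit_edge_lower_bound(N, M, Es, U1p, U1pp):
    """Lower bound on |E_1| obtained by counting edge endpoints.

    N, M : the quantities N and M from the cost bound
    Es   : |E_s|, forced edges incident on s that are used twice
    U1p  : |U_1'|, gadgets visited at least twice
    U1pp : |U_1''|, gadgets not visited at all
    """
    terminal_endpoints = 2 * N - Es                    # one per odd-degree terminal
    gadget_endpoints = 4 * U1p + 2 * (M - U1p - U1pp)  # four or two per visited gadget
    # every unit-weight edge has two endpoints, so halve the total
    return Fraction(terminal_endpoints + gadget_endpoints, 2)
```

For any choice of the five parameters the returned value coincides with $N - \frac{1}{2}|E_s| + M + |U_1'| - |U_1''|$, which is the bound used in the rest of the proof.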

Using this fact we get $T \ge F + N + M + \sum_{e \in E_G} w(e) + \sum_{e \in E_s} (w(e) - 
\frac{1}{2}) + |U_1'| + 2(c(G_T) - 1) - |U_1''|$. 

Now, observe that $|U_1''| \le c(G_T) - 1$, because each element of $U_1''$ forms 
a component and there is one component that is not an element of $U_1''$ 
(the one that contains $s$). Thus, $2(c(G_T) - 1) - |U_1''| \ge |U_1''|$. Combining 
this with the above we get $T \ge F + N + M + \sum_{e \in E_G} w(e) + \sum_{e \in E_s} (w(e) - 
\frac{1}{2}) + |U_1'| + |U_1''|$. Given the known upper bound on the cost of the tour 
we have that $k \ge \sum_{e \in E_G} w(e) + \sum_{e \in E_s} (w(e) - \frac{1}{2}) + |U_1'| + |U_1''|$. 

We now need to argue two facts and we are done. First, $|U_1| \le |U_1'| + 
|U_1''|$. Recall that $U_1$ is the set of unsatisfied clauses that involve honest 
variables. Since the variables are traversed honestly their corresponding 
gadgets are either visited at least twice or not at all, so they are counted 
in $|U_1'|$ or in $|U_1''|$. 

Second, we would like to show that $E[|U_2|] \le \sum_{e \in E_G} w(e) + \sum_{e \in E_s} (w(e) - \frac{1}{2})$. 

Before we do that, observe that if we show this then it follows that 
$E[|U|] = E[|U_2|] + |U_1| \le k$, so there must exist an assignment that leaves 
no more than $k$ clauses unsatisfied and we are done. 

So, let us try to upper-bound $E[|U_2|]$, which is the expected number of 
unsatisfied clauses that contain a dishonest variable. First, observe that if 
there are dishonest auxiliary variables in a cluster then, by the construction of 
the assignment, we have ensured that any unsatisfied clause must contain 
a dishonest main variable. Therefore, it suffices to count the expected 
number of unsatisfied clauses that contain a dishonest main variable. 

Let us define a credit $cr(x)$ for each dishonest main variable $x$. If a 
forced edge connecting a terminal to $s$ is used twice we give $x$ a credit 
of $5/4$ (which is equal to $w(e) - \frac{1}{2}$, since these edges have weight $7/4$). If a 
forced edge in a gadget that involves $x$ and another main variable is used 
twice we give $x$ a credit of $3/4$ (which is equal to $w(e)/2$). Finally, if a forced 
edge in a gadget that involves $x$ and an auxiliary variable is used twice 
we give $x$ a credit of $5/4$ (which is equal to $w(e)$). We define $cr(x)$ to be the 
sum of credits given to $x$ in this process. 

If $D$ is the set of dishonest main variables then it is not hard to see 
that $\sum_{x \in D} cr(x) \le \sum_{e \in E_G} w(e) + \sum_{e \in E_s} (w(e) - \frac{1}{2})$. All edges are counted 
once in the sum of credits, except for those from $E_G$ that involve two main 
variables, for which each variable is credited half the weight. 

We will now argue that the expected number of unsatisfied clauses 
that contain a variable $x$ is at most $cr(x)$. Recall that clauses containing 
$x$ and another dishonest main variable are by construction satisfied, while 
clauses made up of $x$ and one honest variable are satisfied with probability 
$1/2$. Also, clauses of size 3 that contain $x$ are satisfied with probability at 
least $1/2$, since with probability $1/2$ the equation from which the cluster 
was obtained is satisfied. Thus, if $cr(x) \ge 5/2$ we are done. We know that 
$x$ received at least two credits by Lemma Ul so $cr(x) \ge 3/2$, as the smallest 
credit is $3/4$. If $cr(x) = 3/2$ then $x$ must have received two credits that were 
shared with other dishonest variables. Therefore, there are two clauses 
containing $x$ which are surely satisfied, and out of the other three the 
expected number of unsatisfied clauses is $3/2 \le cr(x)$. Similarly, if $cr(x) = 2$, 
then $x$ shared a credit with another variable at least once, so one clause 
is surely satisfied and the expected number of unsatisfied clauses out of 
the other four is $2$. 
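
The case analysis can be verified mechanically. The sketch below assumes the values implied by the argument: a shared credit is $3/4$, an unshared credit is $5/4$, each main variable occurs in five clauses, every shared credit makes one distinct clause surely satisfied, and every other clause is unsatisfied with probability at most $1/2$. All names are hypothetical.

```python
from fractions import Fraction

SHARED = Fraction(3, 4)       # credit for an edge shared with another dishonest main variable
UNSHARED = Fraction(5, 4)     # credit for an s-edge or an edge to an auxiliary variable
CLAUSES_PER_VARIABLE = 5      # assumed number of clauses containing a main variable

def credit(shared, unshared):
    """Total credit cr(x) for the given numbers of shared and unshared credits."""
    return shared * SHARED + unshared * UNSHARED

def expected_unsat_bound(shared):
    """Each shared credit fixes one satisfied clause; the rest fail w.p. at most 1/2."""
    return Fraction(CLAUSES_PER_VARIABLE - shared, 2)

def violating_cases(max_unshared=6):
    """Return all (shared, unshared) pairs with at least two credits that break the bound."""
    bad = []
    for shared in range(CLAUSES_PER_VARIABLE + 1):
        for unshared in range(max_unshared + 1):
            if shared + unshared < 2:
                continue      # x is known to receive at least two credits
            if expected_unsat_bound(shared) > credit(shared, unshared):
                bad.append((shared, unshared))
    return bad
```

Under these assumptions `violating_cases()` finds no counterexample, and the three cases with exactly two credits (two shared, one of each, two unshared) are tight.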

We therefore have $E[|U_2|] \le \sum_{x \in D} cr(x) \le \sum_{e \in E_G} w(e) + \sum_{e \in E_s} (w(e) - 
\frac{1}{2})$ and this concludes the proof. □ 

5 Conclusions 

We have given an alternative and (we believe) simpler inapproximability 
proof for TSP, also modestly improving the known bound. We believe 
that the approach followed here, where the hardness proof goes explicitly 
through bounded occurrence CSPs, is more promising than the somewhat 
ad-hoc method of [13], not only because it is easier to understand but also 
because we stand to gain almost "automatically" from improvements in 
our understanding of the inapproximability of bounded occurrence CSPs. 
In particular, though we used the 5-regular amplifiers from [3], any such 
amplifier would work essentially "out of the box", and any improved con- 
struction could imply an improvement in our bound. Nevertheless, the 
distance between the upper and lower bounds on the approximability of 
TSP remains quite large and it seems that some major new idea will be 
needed to close it. 